Automatic detection of anglicisms for the pronunciation dictionary generation: a case study on our German IT corpus

Authors

  • Sebastian Leidig
  • Tim Schlippe
  • Tanja Schultz
Abstract

With globalization, more and more words from other languages enter a language without being assimilated to its phonetic system. To build lexical resources economically with automatic or semi-automatic methods, it is important to detect such words and treat them separately. Because of the strong increase in anglicisms, especially in the IT domain, we developed features for their automatic detection and collected and annotated a German IT corpus to evaluate them. Furthermore, we applied our methods to Afrikaans words from the NCHLT corpus and to German words from the news domain. Combining features based on grapheme perplexity, grapheme-to-phoneme confidence, Google hit counts, and spell-checker dictionary and Wiktionary lookups reaches an F-score of 75.44%. Producing pronunciations for the words in our German IT corpus with our methods resulted in a 1.6% phoneme error rate against reference pronunciations, whereas applying exclusively German grapheme-to-phoneme rules to all words yielded 5.0%.
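
The grapheme perplexity feature named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the feature is computed, so everything below is an assumption made for illustration: the character trigram models, the add-one smoothing, the fixed `vocab_size`, the toy word lists, and the helper names `train_char_ngrams`, `perplexity`, and `perplexity_difference`. The idea is that a word whose spelling is less surprising to an English character model than to a German one is a candidate anglicism.

```python
import math
from collections import Counter

def train_char_ngrams(words, n=3):
    """Count character n-grams and their (n-1)-character contexts."""
    ngrams, contexts = Counter(), Counter()
    for word in words:
        # Pad so that word-initial and word-final graphemes are modeled too.
        padded = "^" * (n - 1) + word.lower() + "$"
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            ngrams[gram] += 1
            contexts[gram[:-1]] += 1
    return ngrams, contexts

def perplexity(word, model, n=3, vocab_size=64):
    """Per-grapheme perplexity of `word` under an add-one smoothed model.

    `vocab_size` is an assumed constant shared by all models so that
    their scores stay comparable.
    """
    ngrams, contexts = model
    padded = "^" * (n - 1) + word.lower() + "$"
    log_prob, steps = 0.0, 0
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        prob = (ngrams[gram] + 1) / (contexts[gram[:-1]] + vocab_size)
        log_prob += math.log(prob)
        steps += 1
    return math.exp(-log_prob / steps)

def perplexity_difference(word, native_model, english_model):
    """Positive values mean the word looks more English than native."""
    return perplexity(word, native_model) - perplexity(word, english_model)

# Toy training data (hypothetical; a real system would use large word lists).
german_words = ["haus", "schule", "zeitung", "rechner",
                "fenster", "speicher", "bildschirm", "tastatur"]
english_words = ["house", "school", "download", "update",
                 "software", "keyboard", "screen", "browser"]

german_model = train_char_ngrams(german_words)
english_model = train_char_ngrams(english_words)

for candidate in ["download", "speicher"]:
    score = perplexity_difference(candidate, german_model, english_model)
    label = "anglicism candidate" if score > 0 else "native candidate"
    print(f"{candidate}: {score:+.2f} ({label})")
```

In a setup like the one the abstract describes, such a score would be only one input among several (alongside grapheme-to-phoneme confidence, hit counts, and dictionary lookups) rather than a classifier on its own.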

Similar resources

Assessment of non-native phones in anglicisms by German listeners

By means of a pair comparison test, the preferences of German native speakers for English or German sounds in spoken anglicisms were investigated. The collected data can serve as a reference point for deciding which English xenophones have to be integrated into a German TTS system to allow for an appropriate pronunciation of anglicisms in German.

Automatic Error Recovery for Pronunciation Dictionaries

In this paper, we present our latest investigations on pronunciation modeling and its impact on ASR. We propose completely automatic methods to detect, remove, and substitute inconsistent or flawed entries in pronunciation dictionaries. The experiments were conducted on different tasks, namely (1) word-pronunciation pairs from the Czech, English, French, German, Polish, and Spanish Wiktionary [...

Phonetic Transcriptions for the New Dictionary of Italian Anglicisms

This paper describes the work that has been done concerning the phonetic transcriptions for the New Dictionary of Italian Anglicisms directed by Prof. Pulcini (University of Turin): the dictionary contains both transcriptions of how Italians pronounce anglicisms and of how the corresponding English words are pronounced by native speakers of English. We shall explain how different pronunciation ...

Phonetic tool for the Tunisian Arabic

A phonetic dictionary is an essential component of a speech recognition system or a speech synthesis system. Our work targets the automatic generation of a pronunciation dictionary for Tunisian Arabic, in particular in the field of rail transport. To do this, we created two phonetization tools for vowelized and unvowelized words in Tunisian Arabic. The proposed method to automatically genera...

Automatic generation and pruning of phonetic mispronunciations to support computer-aided pronunciation training

This paper presents a mispronunciation detection system which uses automatic speech recognition to support computer-aided pronunciation training (CAPT). Our methodology extends a model pronunciation lexicon with possible phonetic mispronunciations that may appear in learners’ speech. Generation of these pronunciation variants was previously achieved by means of phone-to-phone mapping rules deriv...

Publication date: 2014